Granularity Adaptive Density Estimation and on Demand Clustering of Concept-Drifting Data Streams

Authors

  • Weiheng Zhu
  • Jian Pei
  • Jian Yin
  • Yihuang Xie
Abstract

Clustering data streams has found a few important applications. While many previous studies focus on clustering objects arriving in a data stream, in this paper we consider the novel problem of on-demand clustering of concept-drifting data streams. To characterize concept-drifting data streams, we propose an effective method to estimate the densities of data streams. One unique feature of our method is that its granularity of estimation adapts to the available computation resources, which is critical for processing data streams with unpredictable input rates. Moreover, any clustering method can be applied to cluster data streams on demand using their density estimates. A performance study on synthetic data sets is reported to verify our design; it shows that our method obtains results comparable to CluStream [3] when clustering a single stream, and much better results than COD [8] when clustering multiple streams.
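The abstract does not describe the estimator itself, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes a fixed grid histogram whose cell weights fade exponentially over time (one common way to follow concept drift) and whose cells can be merged into a coarser grid when computation is scarce. The class name `FadingGridDensity` and the parameters `cell_width` and `decay` are hypothetical.

```python
import math
from collections import defaultdict


class FadingGridDensity:
    """Hypothetical grid-based density estimate for a data stream.

    Assumptions (not taken from the paper): equal-width grid cells,
    exponential fading of old points, and coarsening by an integer factor.
    """

    def __init__(self, cell_width, decay=0.01):
        self.cell_width = cell_width
        self.decay = decay                  # fading rate per time unit
        self.weights = defaultdict(float)   # cell index -> faded weight
        self.last_time = 0.0

    def _cell(self, point):
        # Map a point to the integer index of the grid cell containing it.
        return tuple(math.floor(x / self.cell_width) for x in point)

    def _fade_to(self, now):
        # Down-weight all cells so that older points count less.
        factor = math.exp(-self.decay * (now - self.last_time))
        for cell in self.weights:
            self.weights[cell] *= factor
        self.last_time = now

    def add(self, point, now):
        # Fade existing weights to the arrival time, then count the point.
        self._fade_to(now)
        self.weights[self._cell(point)] += 1.0

    def coarsen(self, factor=2):
        # Merge cells into a grid `factor` times coarser (less memory/work).
        merged = defaultdict(float)
        for cell, weight in self.weights.items():
            merged[tuple(c // factor for c in cell)] += weight
        self.weights = merged
        self.cell_width *= factor

    def density(self, point):
        # Normalized histogram density at `point` (0.0 if no data yet).
        total = sum(self.weights.values())
        if total == 0:
            return 0.0
        volume = self.cell_width ** len(point)
        return self.weights.get(self._cell(point), 0.0) / (total * volume)

    def dense_cells(self, min_weight):
        # Cell centers with enough weight; any clustering method could be
        # run on these summaries on demand.
        half = self.cell_width / 2
        return [
            (tuple(c * self.cell_width + half for c in cell), weight)
            for cell, weight in self.weights.items()
            if weight >= min_weight
        ]
```

In this sketch a caller would invoke `coarsen()` when points arrive faster than they can be processed, trading resolution for speed, and pass the output of `dense_cells()` to a clustering routine whenever clusters are requested.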

Similar resources

Resource Constrained Data Stream Clustering with Concept Drifting for Processing Sensor Data

Wireless sensors and mobile devices have been widely deployed as data collecting devices for monitoring real world systems. A large amount of stream data is generated in real-time, which has to be processed in real-time as well. One of the common processing operations is clustering that automatically groups the elements of a stream into a number of clusters in general. Elements of the same clus...

Adaptive Classification Algorithm for Concept Drifting Electricity Pricing Data Streams

Electricity is the main observation in our daily life. The electricity load depends on many parameters or factors, including known load factors such as weather conditions, temporal factors, and customer characteristics. Daily peak load is an important factor in planning the production and pricing of electricity. In simple terms, it is essential to get the knowledge of...

Learning from Concept Drifting Data Streams with Unlabeled Data

Contrary to the previous beliefs that all arrived streaming data are labeled and the class labels are immediately available, we propose a Semi-supervised classification algorithm for data streams with concept drifts and UNlabeled data, called SUN. SUN is based on an evolved decision tree. In terms of deviation between history concept clusters and new ones generated by a developed clustering alg...

A Classification Algorithm for Noisy Data Streams with Concept-Drifting

Processing noisy data is one of the most important fields in mining data streams. To address this problem, we consider the Density-Based Spatial Clustering of Applications with Noise (DBSCAN) algorithm, which filters out noisy data in data streams. Many experiments show that the DBSCAN algorithm takes a lot of time when the database is large. Therefore we impr...

A New Framework for Data Streams Classification

Mining data streams has recently become an important and challenging task for a wide range of services, including credit card fraud detection, sensor networks and web applications. In these applications data do not typically take the form of persistent relations, but tend to arrive in multiple, continuous, rapid and time-varying data streams. Hence, conventional knowledge discovery tools cannot ...

Publication date: 2006